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Abstract 

We apply a recent proposal to speed up the Hybrid-Monte-Carlo simu¬ 
lation of systems with dynamical fermions to two flavour QCD with clover- 
improvement. The basic idea of our proposal is to split the fermion matrix 
into two factors with a reduced condition number each. In the effective ac¬ 
tion, for both factors a pseudo-fermion field is introduced. For our smallest 
quark masses we see a speed-up of more than a factor of two compared 
with the standard algorithm. 


1 Introduction 


It is clear that with simulation algorithms that are used today it will be very 
difficult, if not impossible, to reach the physical values of lightest quark masses. 
The scaling behaviour for Wilson fermions (see |I|, |^, ^ of the algorithms 

predicts enormous costs for simulations at quark masses as light as the u- and 
d-quarks. To reach this physical point, extrapolations using chiral perturbation 
theory (yPT) have to be used. However contact to yPT seems to be happening 
at rather small values of the quark masses themselves (see ref. 0 and refs, 
therein). Any progress to render simulations easier, when approaching the small 
quark mass regime will help therefore to reach the overlap region between yPT 
and lattice QCD allowing for a safe extrapolation to physical quark masses. 

Still the HMC (Hybrid-Monte-Carlo) algorithm 0 and its variants [|, ||, 
are the methods of choice in simulating lattice QCD with dynamical Wilson 
fermions. Recent large scale simulations with two flavours of Wilson fermions 
and standard boundary conditions |^, [T^ only reach a ratio of m-^/mp ^ 0.58, 
while the physical point is given by m.^/mp ~ 0.18. Note that the decay of the 
p-meson into two vr-mesons is only possible if rriT^/mp < 0.5. In refs. 




explorative studies with m-„/mp ^ 0.4 were reported and it was found that the 
simulations become substantially more expensive. Going to light quarks, the 
HMC algorithm becomes increasingly expensive for at least two reasons: First 
the condition number of the fermion matrix increases. Hence the number of 
iterations needed by the solver employed increases. Secondly, but maybe related, 
the step-size of the integration scheme has to be decreased to maintain a constant 
acceptance rate. In fact, in ref. a step size as small as 1/400 is needed for 
the leap-frog integration scheme. 

In ref. we have demonstrated that the obstacle of very small step sizes 


at low values of the quark mass can be lessened by a modihcation of the pseudo¬ 
fermion action. The proposal is to split the fermion matrix into two factors and to 
introduce a pseudo-fermion held for both factors. Each of the corresponding two 
matrices has a smaller condition number than the original fermion matrix. As a 
consequence the “fermion force”, i.e. the response of the system on a variation 
of the gauge held, is substantially reduced. The numerical study of the two 
dimensional Schwinger model showed that the step-size can be enlarged and thus 
the computational ehort can be reduced substantially this way. In ref. 


16 


we 


presented hrst results for lattice QCD with two havours of Wilson fermions and 
clover improvement |^. In the present paper we extend this study towards larger 


lattices and smaller quark masses. Also, we compare two diherent factorisations 
of the fermion matrix. We simulated at /3 = 5.2 on 8^ x 24 and 16^ x 24 lattices. 
To allow a comparison with the literature, we have taken the values for the 
parameters Cgw and k from ref. |]^. Our largest value of k corresponds to 
m-j^/mp = 0.698. Results of simulations with Schrodinger functional boundary 
conditions are reported in ref. |T^. A brief summary of the present work can be 
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found in lEO 


The paper is organised as follows: In section 2 we briefly recall the dehni- 
tion of clover-improved Wilson fermions [^, 0. As basic improvement we 

employ even-odd preconditioning as discussed in 
the modihed pseudo-fermion action [|^ and a variant of it |T^ 


Next we discuss in detail 
We explain how 

(easily) the HMC algorithm can be adopted to the modihed pseudo-fermion ac¬ 
tion. It follows a thorough discussion of the integration schemes that we have 
used. In section 4 we give the details of our simulation. Next we present our 
numerical results and discuss its implications. Finally we give our conclusions 
and an outlook. 


2 The modified pseudo-fermion action 


In this section we remind the reader of the action of clover-improved Wilson 
fermions. For completeness, we briehy recall even-odd preconditioning and the 
standard form of the pseudo-fermion action that is used in the HMC simulation. 
Then we show how the modihed pseudo-fermion action that was proposed in ref. 
pTfl can be generalised to clover-improved fermions. In addition to the original 


proposal, we consider an alternative that is inspired [^] by twisted mass QCD 
and was hrst presented in ref. |^. Finally, we explain how the HMC algorithm 
can be adopted to the modihed pseudo-fermion action. 


2.1 The model 

Our aim is to simulate the system dehned by the partition function 
Z = j D[H]exp(-^G[f/]) detM[f/]2 , 
where the Wilson plaquette action is given by 

= -f E E Tr 


( 1 ) 


( 2 ) 


X 


where x are sites on a hyper-cubical lattice, p, G {0,1,2,3} are directions on 
the lattice and /i is a unit vector in /i-direction. In eq. (|^, the fermion degrees of 
freedom have been integrated out. For Wilson fermions with clover-improvement, 
the fermion matrix M is given by 


^\U\xy I 1 2 ^ yv ^ ) ^x,y 


Uy,{x) ^x+jl,y + (1 + 7/.) , ( 3 ) 
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where we sum over /i and u. The anti-symmetric and anti-Hermitian tensor JF is 
given by 

= ^[U^{x)U^{x +fi)Ul{x + i))Ul{x) 

+ Uy{x)Ul{x + P - fi)ul{x - jx)U^{x - jl) 

+ Ul{x - fi)Ul{x - z> - fi)U^{x - z> - fi)U^{x - P) 

+ Ul{x - P)U^,{x - P)Uy{x - P + jl)Ul{x) 

- h.c] . (4) 

For a discussion of on-shell 0(a)-improvement see e.g. refs. B 0 .SSS- 
In our simulations we have used even-odd preconditioning throughout. We 
followed the proposal of ref. [^, see also 0 . The fermion matrix can be written 
as 

^ _ f lee + ^ee 

y —nMoe loo + Too 

where we have introduced the matrix TeeiToo) on the even (odd) sites as 

• ( 6 ) 

The off-diagonal parts Meo and Moe, which connect the even with odd and odd 
with even lattice sites, respectively, are just the conventional Wilson hopping 
matrices. The determinant of the fermion matrix can now be written as 

detM oc det(lee + Tee) detM , (7) 

where 

M = loo + Too ~ Tfoe(lee + Tee) ^Meo ■ (8) 

In the following discussion we shall refer to the Hermitian matrix 

Q = C 075 M (9) 

with Co a constant set to cq = 1 throughout this work. 

In the standard HMC simulation of mass-degenerate two-flavour Wilson fermions, 
the effective action 

Seff[U, 0 ] = Sg[U] + S,et[U] + Sf[U, 0 ] , (10) 

with 

Sdet[U] = -2Trlog(l + Toe) 

SF[U,cj)\<P] = ( 11 ) 
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is used. We shall refer to Sdet\U] as the determinant contribution. The pseudo¬ 
fermion action Sf\U, 0] is based on the representation 


detQ^ oc 


D0^ 


D0exp(—^0) 


( 12 ) 


of the square of the fermion determinant. In our study we keep S'g'[ 17] and Sdet\U] 
in their standard form. However, Sf[U, 00 0] is replaced by modihed expressions 
to be discussed below. 

As an alternative, the authors of ref. 
ment of even and odd sites. The result is 


231 suggest a more symmetrical treat- 


detM (X det{lee + Tee)det{loo + Too)detMsym , (13) 

where now 

Mgym = loo ~ (loo + Too) ^Afoe(lee + T^e) ^M^o ■ (14) 


The authors of ref. |]28| hnd that the choice of eq. (^) leads to a roughly 30% 


higher performance of the HMC algorithm than the choice of eq. ( 0 )- 
Since we wanted to compare our results with those of ref. 


18 and we started 


with our study before ref. appeared, we have used only eq. (|^) in this study. 
However, it is straight-forward to apply our modihcation to eq. (|T^ and it can 
be expected that a similar gain should be found. 


2.2 The modified pseudo-fermion action 


Our starting point of modifying the action of eq. (pT]) is the observation that, at 
hxed acceptance rate and length of the trajectory, the step-size of the integration 
scheme has to be decreased with decreasing sea-quark masses and hence with 
increasing condition number of the fermion matrix. This effect can be nicely seen 
e.g. in table 11 of ref. |TT . 

Also, a number of studies (see e.g. |2^, ^ ^) have shown that replacing 


the original fermion matrix by a preconditioned fermion matrix in the pseudo¬ 
fermion action allows for a larger step-size in the HMC simulation at the same 
acceptance rate. 

Based on these observations, one of us proposed to factorise the fermion 
matrix into two parts, with reduced condition number each. The determinant of 
both factors is estimated by pseudo-fermion helds: [| 


detg^ = detWW^ det[W-^Q][W-^Q]^ oc / D0l / D0i 


D0t 


D02 


exp ( -4 {ww^y' .#.1 - 4 ([w'-'oi ‘ 4>2 


(15) 


^In refs. [BO, 32 we applied a similar modification of the pseudo-fermion action to speed up 


the local updating of lattice QCD. 
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Note that W might be non-Hermitian. In the following we shall use the notation 


, Sf 2 = 4{V^'~"QW~"Q]'y'4" 


(16) 


In ref. we also considered even-odd preconditioning, but no clover-improvement. 
We constructed W by a shift in the hopping-parameter k: W = 75 M, where 


M = 1 - 


oe^^-^-eo 1 


(17) 


with K < K. Since the fermion matrix can be multiplied by a constant factor 
without changing the physics, we can write equivalently 


W = Q + pti, 


(18) 


This is the form that we apply to the even-odd preconditioned, clover-improved 
Q of eq. (i- In ref. we have demonstrated at the example of the two- 
dimensional Schwinger model that the step-size of the integration scheme can 
indeed be increased compared with the standard pseudo-fermion action. This 
gain increases as the sea-quark mass decreases. For the lightest mass that we 
studied, the step-size could be increased by more than a factor of two for the 
optimal choice of k. 


In addition to the original choice (pISD of W we shall consider 

IT = Q -\- ipl. , 


(19) 


which is inspired by twisted mass QCD ||^ and was hrst tested in ref. [|6 


In terms of Q the resulting pseudo-fermion action is 5^ = S'^i -|- 5^2 with 
S FI = + fTf , Sf2 = (pl[t +fn5Q~^][^ +PQ~^'l5]4>2 ( 20 ) 


for eq. ([iq) and Sp = Spi + Sf 2 with 

Spi = + P^t] ^01 , Sf 2 = (l>2[^ + P^Q 


( 21 ) 


for eq. ([I^) . 

The aim of our modihcation is to use matrices with reduced condition number 
in the pseudo-fermion action. Let us denote the smallest and the largest eigen¬ 
value of by Xmin and X^ax, respectively. If we choose Xmin < < X^ax, 

we hnd the condition number of + p^ to be Xmax/P^- On the other hand, the 
condition number of the second matrix used in Sf 2 of eq. (^) becomes p^/Xmin- 
This suggests an optimal choice of 0 


P 


= ^ 


max^min 


( 22 ) 


^We are grateful to R. Sommer for this argument. 
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In the simulations with the pseudo-fermions using the action Sp oi eq. , as 
described below, we followed this rule. The obvious disadvantage of this choice 
is that the eigenvalues Amm and Xmax have to be computed to a reasonable accu¬ 
racy before the simulation with the modihed pseudo-fermions can be done with 
optimised choices of the algorithm. Also it is not clear, how much the detailed 
distribution of the eigenvalues of matters for the step size of the integration 
scheme used. 

Therefore we did not use the rule of eq. in the simulations with the action 
of eq. (1^) . Instead, we tried to hnd numerically the value of p, where for hxed 
step-size the acceptance rate is the largest. 

As a last remark in this section we want to mention that with the choice of 
of eq. (p^ it seems that the condition number k of the original action, eq. (11), 
is reduced to \/k when using the action given in eq. (0). If hence the scaling 
behaviour of the HMC algorithm is indeed determined to a large extend by the 
condition number of the fermion matrix used, a much better scaling behaviour 
can be expected when the chiral limit is approached. 


2.3 The Hybrid-Monte-Carlo Algorithm 

For completeness, we recall the steps of the Hybrid-Monte-Carlo algorithm [0 
applied to the standard pseudo-fermion action of eq. (0). One elementary update 
(“trajectory”) of the HMC algorithm is composed of the following steps: 

• Global heat-bath of the pseudo-fermions and the conjugate momenta. 

• Molecular dynamics evolution of the gauge-held and the conjugate momenta 
P with hxed pseudo-fermions. 

• Accept/Reject step: the gauge-held U' that is generated by the molecular 
dynamics evolution is accepted with the probability 

Pace = min[l, exp(-hf (C, P\ 0) + H{U, P, 0))], 

where P' represents the conjugate momenta generated in the molecular 
dynamics evolution. 

The last step is needed, as the molecular dynamics evolution can not be done 
exactly and hence requires a numerical integration scheme, rendering API = 
H{U,P,(j)) — H{U',P',(j)) 7 ^ 0. To obtain a valid algorithm, this scheme has to 
be area preserving and reversible. Details on the numerical integration schemes 
that we have studied are given in section 
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2.4 Heat-bath of the pseudo-fermion fields 

The field is initialised at the beginning of the trajectory as usualQ: 

(j)i = Wrii , (23) 

where rji has a Gaussian distribution. The heat-bath of 02 however requires the 
application of the inverse of W: 


(j)^ = W-^Qr]2 , (24) 

where also 772 has a Gaussian distribution. 

2.5 The variation of the pseudo-fermion action 

An important feature of our approach is that the variation of the modified pseudo¬ 
fermion action can be computed as easily as in the standard case. For the first 
part of the pseudo-fermion action we obtain upon a variation of the action with 
respect to the gauge fields 

6Sfi = -X^ 6WY -Y^ 5W^ X (25) 

with the vectors 

X={WW^Y^ (t)i , F = , (26) 

which is essentially the same as for the standard pseudo-fermion action. The only 
difference is that Q is replaced by W. 

For the second part of the pseudo-fermion action we get 

5Sf2 = -X^ 5QY -Y^ 5Q^ X + XUhF 02 + Y ^ (27) 

with the vectors 

77 = (q)"' hF02 , Y = Q-^W<j)2 . (28) 

For our choices of W in eq. (|T^ (and of W in eq. (|T^) the variation of Sf 2 
(and Sf 2 ) further simplifies and it holds 6W = 6Q (= 6W). We therefore get 

6Sf2 = -X^ SQY -YUQ X , (29) 

where 

X = Q-'[l + pg-S]02 , Y = pQ-Y5h (30) 

^Here we discuss only the case of the action Sp in eq. (|^). The case of the action Sp in 
eq. (M) is basically identical. 
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(31) 


for the action in eq. (]20|) and 


X = p^Q-^ct)2 , Y = Q-^ 


for the action in eq. (|^) . 

Let us discuss the effect of introducing the parameter on the variation of the 
pseudo-fermion actions. Since the step size used in the HMC algorithm increases 
as the condition number decreases, we expect that we can choose a larger step 
size for the action Spi. For the action Sp 2 we see from eq. (^) that its variation 
includes a factor of p^. Since -C 1, we expect again smaller variations of the 
action along a trajectory leading to larger choices of the step size. A similar 
argument holds for the action Sp oi eq. 


3 Integration schemes 

We have tested two different integration schemes: The standard leap-frog and a 
partially improved one suggested by Sexton and Weingarten (see eq. (6.4) of ref. 
Q) that still has an error as the leap-frog scheme. However the amplitude 


of this error is considerably reduced. 

Let us dehne the update of the gauge-held and the momenta as 


Tu{5t) 

Tp{5t) 


U 

P 


e 

P 


iSr P jj 

iSt 6u(S,(U) + S,(U)) 


(32) 

(33) 


where 5u denotes a variation with respect to the gauge helds. 
The elementary step of the leap-frog algorithm is given by 

Uir)^Tp{P\ Tu(6t)Tp{^ 


(34) 


A trajectory is composed of Nmd consecutive elementary steps. Here we use 
trajectories of length 1, i.e. 6t = 1, as it is done in most HMC simulations. 
Note that the order of the updates of momenta and gauge-helds is not unique. 

m 


In fact, in ref. 


it was demonstrated that the alternative order 
n(Sr) = Tv TpiSr) Tv 


(35) 


achieves the same acceptance rate as eq. (^) (see hg. 2 of ref. with a 

roughly 15% larger step size 5t. 

Since we like to compare our numerical results with those of ref. 


18 we have 


used the order of eq. (^) in our numerical study. Also it is not clear to us how 
the idea of a split time-scale introduced in ref. can be applied to eq. (|55D. 

In the literature ^ also higher-order schemes are discussed. These 

schemes become increasingly complicated as the order increases. Recent studies 
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[pj] , |3^ , |T7| show that higher-order schemes become less efficient than the simple 
leap-frog integration scheme of eq. (^4|) as Kc is approached. 

In addition to the genuinely higher-order schemes, Sexton and Weingarten [Q 
discuss a scheme with reduced errors. In the numerical tests, using Wilson 
fermions, on a 4"^ lattice at /? = 5.4 and n = 0.162, Sexton and Weingarten 
found that this scheme achieves the best performance. The elementary step of 
this scheme is given by 


T4Sr) = T, [ tJ l^r 


6t 


Tu(-]Tp(- 


6t\ 


(36) 


Note that in an elementary step of this scheme, the variation of the action with 
respect to the gauge-helds has to be computed twice. Comparing the efficiency 
of this scheme and the leap-frog we have to keep this factor of two in mind. 

In our study we are primarily interested in the effect of the pseudo-fermions 
on the step-size of the integration scheme. Therefore we have employed the split 

H, 


of the time-scale proposed by Sexton and Weingarten 


This, in practice. 


allows to eliminate completely the effect of the gauge action on the step-size. 

The update of the momenta is split into two parts: One with respect to the 
gauge action and one with respect to the pseudo-fermion action; 


Tpg{St) ; P^P-i5t 5uSg{U) 
Tpf{6t) : P^P-idr SuSf{U) 


The leap-frog scheme is now generalised to 
' 6t 


T2(n,5r) = T, 


PF 


I £) (?) (? 


T; 


PF 


6t 


The improved scheme in eq. is generalised to 


Un,ST) - rpFlyi x(^\ rppCid x(^\ 


where 


A'(ip) = r,«lT) r„(f) T.a {Isr) T„(f) Tp„(f) 


(37) 

(38) 


(39) 


(40) 


(41) 


In our study we have used n = 4 throughout. We have tested that larger 
values of n hardly further increase the acceptance rates. 


4 The Numerical Study 

We have tested our modihed algorithm at parameters that had been studied by 
UKQCD before []^. We have performed simulations at /3 = 5.2 and Csw = 1.76. 
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Note that Csw = 1-76 was a preliminary result for the improvement coefficient, 
while the hnal analysis resulted in Cgw = 2.0171 for (3 = 5.2 p2[. We applied 


periodic boundary conditions in all lattice directions, except for anti-periodic 
boundary conditions in time-direction for the fermion-helds. 

As a hrst step we have studied a 8^ x 24 lattice at k = 0.137. Later we simu¬ 
lated a 16^ X 24 lattice at k = 0.139, 0.1395 and 0.1398. Following the tables 4.4 
and 4.6 of ref. |T^, these values of k correspond to m^/rup ~ 0.856,0.792,0.715 
and 0.686, respectively. From table 4.8 of ref. 
an error of about one in the last digit. 


18 


we read that Kc ~ 


0.1405 with 


4.1 Details of the implementation 

We have simulated the two modihed actions with independent programs. The 


program for the action S'^ in eq. (pT]) was written in TAO and was executed on 


a APEIOO computer. The program for the action S'^ in eq. (|^) was written in 
C with sequences of assembler code to take advantage of the SSE2-instructions 
of the Pentium 4 CPU (See ref. [Q). Since the implementation in the two cases 
differ in many respects, we give separate discussions of the details below. 


4.1.1 The action Sp 

Here we have used double-precision floating point arithmetic throughout. We 
have implemented the BiCGstab algorithm to solve the linear equations 
(p4| , p6| , |30D and to compute the action at the end of the trajectory. 

As start vector we have always used the zero-vector. The square of the residual 
is given by 

r2 = , (42) 

where is the approximation of the solution after the iteration. Note that 
we did not compute using eq. (|4^) directly, but used instead the iterated result 
of the BiCGstab algorithm. The iteration is stopped after becomes smaller 
than a given bound. For the heat bath of eq. (p^) of the held <p 2 and to compute 
the action at the end of the trajectory we have required < 10“^*^. 

As long as the start vector does not depend on the history of the trajectory, 
the correctness of the HMG algorithm does not rely on the accuracy of the vectors 
X and Y of section |2.5| . However, if the accuracy becomes too low, XH becomes 
large and hence the acceptance rate low. 

The vectors X and Y for Spi we computed as follows: 


where we required 


MY = 7501 , 




(43) 

(44) 
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as stopping criterion, where is a certain cnt-off, the choices of which are listed 
below. For the solntion X of 


MX = 75^ 


(45) 


we required < R^. Here we followed the recommendation of ref. |]I^. Since 


the error of Y propagates to X, we can not reach the same accuracy of X as of 
Y. Therefore it is reasonable to require a higher accuracy for eq. (|43|) than for 
eq. dH). 

In a similar fashion we compute the vectors X and Y for the second part of 
the pseudo-fermion action Sf 2 '- 


MF = p02 , 


(46) 


where we require < O.Oli?^. The vector X is then obtained through 


MX = 75(02+ H) 


(47) 


with the stopping criterion < R^. 

Unfortunately, there is also a subtle effect of the accuracy of X and Y on 
the reversibility of the algorithm. For a discussion see refs, |^, ^ and 

refs, quoted therein. We performed a few checks to investigate this problem. 
We hnd that the amplitude of the reversibility violations is much the same for 
the leap-frog as for the Sexton-Weingarten improved scheme. Also it does not 
depend much on the parameter p of the modihed pseudo-fermion action. On 
the other hand we see, in accordance with the literature, that the reversibility 
violations become much worse when we compute X and Y with reduced accuracy. 
Going a trajectory of length 1 forward and backward again, we see for k = 0.1398 
violations in the Hamiltonian that are smaller than 10“® for R? = 10“^°. On the 
other hand, for R? = 10“'^ the violations become as large as 10“^. Nevertheless, 
these reversibility violations are still small enough, not to invalidate the results 
reported below. 

In the case that the BiCGstab algorithm fails to converge, which never hap¬ 
pened in our study, X and Y have to be computed with the conjugate gradient 
algorithm: 

For Spi we hrst compute X as the solution of 



W^X = 01 

(48) 

followed by 

Y = IVX . 

(49) 

For Sf 2 we first compute X 

as the solution of 



Q^X = (Q -h P75)02 

(50) 

followed by 

Y = QX-(t)2 . 

(51) 
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4.1.2 The action Sp 


These simulations were performed with single precision arithmetic throughout. 
As solver we have used the conjugate gradient algorithm. As start vector we 
have used the solution of the previous step. The vectors X and Y for Spi are 
computed as follows: First we solve the linear equation 


+ p^]x — (j)i 

(52) 

followed by 

Y = lU^X . 

(53) 

For Sf 2 we solve 

Q^x = p^^i 

(54) 

followed by 

Y = p-'^W^X . 

(56) 


For X we tried to reach essentially the machine precision. Thus, as stopping 
criterion we required that the square of the iterated residual divided by the square 
of the norm of X is smaller than 10“^. 


4.2 Analysing the Performance of the Algorithm 

The goal of the optimisation of an algorithm is to obtain a given statistical error 
for the observable one wants to compute with a minimal amount of CPU time. 
To this end, the ALPHA-collaboration has studied in a benchmark of various 
algorithms [Q the quantity 


Mcost — 


CPU-time 
(stat. error) ^ 


(56) 


The practical problem with this dehnition is that it requires rather large statis¬ 
tics to obtain reliable results for the integrated auto-correlation time Tint and 
correspondingly for the statistical error. On the other hand it requires only a 
rather small number (say 100) of trajectories to obtain an accurate estimate of 
the acceptance rate. Therefore we shall base our study on the hypothesis that for 
a hxed length of the trajectory and a hxed acceptance rate, the auto-correlation 
times are independent on the parameter p of the modihed pseudo-fermion action 
and the integration scheme that is used. In fact, this hypothesis is backed up by 
the simulations of the Schwinger model in ref. |T^ and the simulations of the 
8^ X 24 lattice presented below. 

Following this hypothesis, we are looking for the parameter p that allows for 
the largest step-size (i.e. minimal CPU-cost) at a given acceptance rate. This 
requires however hne tuning of for each parameter of p. Instead we have kept 
the step-size 5t hxed and have searched for the value of p with the maximal 
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acceptance rate. The number of iterations needed to solve the linear equations, 
given in eq. (p6|), depends on p. On the other hand the linear equations, given in 
eq. (^), always involve the matrix Q and hence the number of iterations does not 
depend on p. Since the number of iterations for eq. (1^) is, at least for k —^ Kc, 
much larger than that for equations eq. (pb]) we shall ignore the dependence of 
the total iteration number on p and concentrate on the acceptance rate. 


4.3 Simulation results 

4.3.1 Simulations of the 8^ x 24 lattice 

As a hrst step, we have studied quite extensively the 8^ x 24 lattice at k = 
0.137. First we searched for the optimal value of p for the action Sp in eq. (^Ol) . 
After thermalising the system, we performed runs with 200 trajectories each for 
a range of values for p. We have hxed the step size 5t = 0.02 for the leap-frog 
scheme and 5t = 0.05 for the Sexton-Weingarten improved scheme. The results 
for the acceptance rate are summarised in hgure |^. In order to determine the 
acceptance rate, we have averaged min[l, exp(—Aif)] instead of counting the 
accepted conhgurations. In this way the statistical error of the acceptance rate 
is considerably reduced. 

We hrst observe that the acceptance rate with the modihed pseudo-fermion 
action is indeed higher than for the standard action (p = 0). For both integration 
schemes we see a rather broad maximum of the acceptance rates and the maximal 
acceptance is reached for p 0.5. 

In order to compare the efficiency of the various approaches, we have per¬ 
formed more extended runs for p = 0 and p = 0.5 with the action S'p, eq. (pOf ), 
and for the action 5^, eq. ( 0 )- In the case of we have chosen p following the 
rule given in eq. (^). In these extended runs we have chosen the step-size 5t 
such that Pace ~ 0.8. The parameters of these runs and the acceptance rates are 
summarised in table 2. In the case of the leap-frog scheme the step-size can be in¬ 
creased by a factor of about 1.6 by using the modihed pseudo-fermion action Sp, 
eq. (^Ol). In the case of the Sexton-Weingarten improved scheme, we see a similar 
increase of the step-size by a factor of approximately 1.7. The pseudo-fermion 
action Sp seems to perform slightly better than the action Sp. Comparing the 
leap-frog and the Sexton-Weingarten improved scheme, we see a small advantage 
for the Sexton-Weingarten improved scheme, independent of the pseudo-fermion 
action that is used. 

In table ^ we give our results for the plaquette and the minimal and maximal 
eigenvalue of for the runs with the action Sp of eq. (|20|) . The observed consis¬ 
tency of the results for these observables among the three runs gives us conhdence 
that the modihed pseudo-fermion action has been correctly implemented in the 
program. Also the result P = 0.49405(34) for the plaquette given in ref. JlS 


is in reasonable agreement with ours. From the run with the action Sp we get 
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Figure 1: Results for the acceptance rate of the leap-frog scheme (circle) and the 
Sexton-Weingarten improved scheme (triangle) as function of the parameter p. 
The dashed and the dotted lines should only guide the eye. For the leap-frog 
scheme, we have hxed the step-size to 5t = 0.02 and for the Sexton-Weingarten 
improved scheme to 5t = 0.05. The simulations are performed on a 8^ X 24 lattice 
at /3 = 5.2, Csw = 1-76 and k = 0.137. The pseudo-fermion action Sp of eq. (^) 
is used. 
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Amin = 0.00584(4), which is also consistent with the resnlts obtained with the 
action Sp- 

For the rnns with the action Sp, eq. (^), we have compnted integrated anto- 
correlation times for the plaqnette and for Xmin- We have trnncated the snmma- 
tion of the anto-correlation fnnction at t = 80. Within error-bars the integrated 
antocorrelation times are the same for the three rnns. he. onr hypothesis that 
the antocorrelation times do not depend very mnch on the integration scheme 
and the form of the psendo-fermion action, as long as the acceptance rate is the 
same, is conhrmed. 

In table ^ we give the iteration nnmbers that are needed by the BiCGstab 
solver in the simnlations with the action Sp. We will denote by Ni the nnmber 
of iterations for the action Spi {Spi) and with N 2 the nnmber of iterations for 
the action Sp 2 {Sp 2 ). W constitntes the nnmerical overhead cansed by the extra 
part Spi (Spi) of the modihed psendo-fermion action. As can be seen in table ^ 
N 2 depends only little on p, as expected, since here the original matrix has to 
be inverted. Since most of the CPU-time is spent for the solver, the total nnmber 
of solver-iterations per trajectory is a good indicator for the nnmerical effort. 

We see that, despite the nnmerical overhead dne to the additional psendo- 
fermion held, a net advantage for the modihed psendo-fermion action remains, as 
demonstrated by the total nnmber of iterations given in the last colnmn in table ^ 
As we shall see below, this total nnmber of iterations can be fnrther rednced by 
choosing a larger valne of R^, see eq. (|^). However, this ahects the simnlation 
of the standard psendo-fermion action and the modihed psendo-fermion action 
in the same way. Hence our conclusion on the relative performance gain is not 
ahected. 

In the simulation of the psendo-fermion action Sp we hnd the average iteration 
numbers of the conjugate gradient solver to be N 2 = 121.8(3) at p = 0 and 
Ni = 27.278(3) and N 2 = 122.1(3) for p = 0.348. Note that here we have used 
the same stopping criterion for the calculation of the “force” as for the calculation 
of the action at the end of the trajectory. Again N 2 does not depend on p and 
Ni gives the overhead of the simulation with the modihed psendo-fermion action. 
It is difficult to compare the iteration numbers of the CG and the BiGGstab in 
our study, since diherent stopping criteria have been used. Nevertheless, if we 
compare twice the iteration number of the BiGGstab with the iteration number of 
the GG, as we should, we end up with a ratio of / (2iV^*‘^‘^) 1.75. Ghoosing 

more comparable stopping criteria, as was done in ref. , an advantage of 40% 
in favour of the BiGGstab solver can be found. Note however that the BiGGstab 
requires more linear algebra operations per iteration than the GG. E.g. on the 
APEIOO computer this means that the BiGGstab and the GG solver require 
approximately the same GPU-time. 

Next we tested what accuracy of the solver is needed in the calculation of 
the variation of the action to maintain a high acceptance rate. For this purpose 
we have performed runs with 200 trajectories each for the action Sp with p = 0 
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Table 1: Results for extended runs for the 8^ x 24 lattice with (3 = 5.2, k = 0.137 
and Csw = 1.76. We have used either the leap-frog (L) or the Sexton-Weingarten 
improved (S) scheme; ”stat” gives the number of trajectories that have been 
generated. Pace is the acceptance rate. In the runs with the pseudo-fermion 
action Sp, eq. we have averaged miii[l, exp(—Ai7)]. In the case of Sp, 

eq. (1^), we have just counted the accepted configurations. 


scheme 

action 

P 

St 

stat 

P 

^ acc 

L 

eq. (20) 

0.0 

0.025 

6030 

0.793(3) 

L 

eq. (20) 

0.5 

0.04 

8300 

0.770(3) 

S 

eq. (^ 

0.5 

0.1 

6170 

0.883(2) 

S 

eq. (21) 

0.0 

0.06 

2400 

0.83(1) 

s 

eq. (^) 

0.348... 

0.1 

8000 

0.798(9) 


Table 2: Observables for the runs with the action of Sp, eq. (|2^, of the previous 
table. P is the value of the piaquette. Xmin SLnd Xmax are the minimal and 
maximal eigenvalue of Q^. Computing Tint for the piaquette and Xmin we summed 
autocorrelation-functions up to t = 80. 


scheme 

P 

P 

'Tint,P 

^min 

Pnt,X 

^max 

L 

0.0 

0.49474(29) 

25.8(6.0) 

0.00593(4) 

9.3(2.2) 

2.8377(11) 

L 

0.5 

0.49487(22) 

23.2(4.6) 

0.00591(3) 

8.2(1.6) 

2.8381(9) 

S 

0.5 

0.49457(27) 

24.4(5.6) 

0.00596(4) 

9.6(2.2) 

2.8389(10) 


Table 3: Iterations needed by the BiCGstab solver. and are the num¬ 
bers of iterations that were needed to compute Spi and Sp 2 of eq. (|S7|j, respec¬ 
tively, at the end of the trajectory. and are the iteration numbers 

that are needed to compute the vectors V for Spi and Sp 2 of eq. (1^), respec¬ 
tively. 7V*'’*“* is the total number of iterations needed for one trajectory. Tpf is 
the auto-correlation time of the total number of iterations. 


scheme 

P 

j\jacc 

Nacc 

Np 

Ntraj 

tn 

jS^total 

L 

0.0 

- 

60.4(2) 

- 

34.11(11) 

21.1(4.9) 

2777(9) 

L 

0.5 

18.762(8) 

61.2(2) 

11.533(7) 

34.89(9) 

19.4(3.9) 

2362(5) 

S 

0.5 

18.750(13) 

60.9(2) 

11.527(8) 

34.78(12) 

23.1(5.3) 

1922(5) 


16 













































Table 4: Runs with 200 trajectories each for the 8^ x24 lattice, f3 = 5.2, Csw = 1-76 
and K = 0.137. The pseudo-fermion action of eq. ^2Q ) is used. Here we study 
the dependence of the acceptance rate Pace on the stopping criterion of the 
BiCGstab solver, see eq. (ED- 


scheme 

p 

6t 

r:^ 

p 

^ acc 

js^total 

L 

0.0 

0.025 

1 

0.410(27) 

1022(6) 

L 

0.0 

0.025 

0.1 

0.723(20) 

1230(5) 

L 

0.0 

0.025 

0.01 

0.809(17) 

1430(6) 

L 

0.5 

0.04 

1 

0.390(26) 

941(5) 

L 

0.5 

0.04 

0.1 

0.687(22) 

1078(5) 

L 

0.5 

0.04 

0.01 

0.741(18) 

1269(6) 

S 

0.5 

0.1 

1 

0.361(26) 

782(3) 

S 

0.5 

0.1 

0.1 

0.747(19) 

899(5) 

s 

0.5 

0.1 

0.01 

0.863(11) 

1030(4) 


and p = 0.5 and the same step-sizes as for the runs reported in the table and 
various values for Rj^. Our results are summarised in table We observe that 
the dependence of the acceptance rate on R? is much the same for p = 0 and 
for p = 0.5 as well as for the two choices of the integration scheme. Going from 
= 0.1 to 7?^ = 1, the acceptance rate drastically drops. On the other hand, 
for R? = 0.01 we have essentially reached the acceptance rate of R? = 10“® that 
is given in table Q. 


4.3.2 Simulations of the 16^ x 24 lattice 


In order to have a more difficult situation, we went on to the 16^ x 24 lattice 
with K-values that are closer to Kc. In particular, we have performed simulations 
at K = 0.1390, 0.1395 and 0.1398 with the pseudo-fermion action Sp of eq. (pOf ). 


For comparison, we have also simulated the action Sp of eq. (^if) at k = 0.1395. 

Our results for the acceptance rate and the total number of iterations of the 
BiCGstab solver per trajectory for k = 0.1390 and 0.1398 are summarised in 
table For each set of parameters we have generated 100 trajectories, starting 
from an equilibrated conhguration. Since the integrated auto-correlation time 
of the acceptance rate is very small, we can give meaningful error-estimates for 
this quantity. On the other hand, as we have learned from the longer runs of 
the 8^ X 24 lattice, the integrated auto-correlation time of the total number of 
iterations is much larger. Hence we do not quote error-bars for and consider 
pjtotai only an indication for the relative CPU-costs of the simulations. 

Let us start the discussion by testing the effect of the stopping criterion R^ 
of the BiCGstab solver on the acceptance rate. For the leap-frog scheme with 
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p = 0.5 and k = 0.1390 the acceptance rate for = 0.01 is almost the same as 
for = 0 . 001 , while it drastically drops as we farther relax the stopping criterion 
to = 0.1 and R^ = 1.0. In order to keep the effect of R^ on the acceptance 
rate small we have chosen R^ = 0.001 or R^ = 0.0001 in the following simnlations 
for K = 0.139. In the case of k = 0.1398 we see a similar dependence of the 
acceptance on R^. Here we have chosen R^ = 0.0001 for most of the simnlations. 

Next, we stndied the dependence of the acceptance rate on the parameter p 
of the modihed psendo-fermion action. For the leap-frog scheme with p 7 ^ 0 we 
have nsed the step-size = 0 . 02 . As can be seen in table the maximnm of the 
acceptance rate is reached at p = 0.5 for k, = 0.1390 as well as k = 0.1398. As for 
the 8 ^ X 24 lattice sX k = 0.137, this maximnm is broad. This means that no hne 
tnning is needed to obtain almost optimal performance. In order to obtain the 
same acceptance rate for p = 0 as for p = 0.5 the step-size has to be halved. The 
performance of the algorithm in terms of iV*"*“* using Sp is shown in £g. |^. We 
selected points from table ^ that have an acceptance rate of about 80% . For the 
case of the leap-frog integrator we hnd a gain in performance by a factor of 1.66 
at K = 0.1390, which becomes 1.79 at k = 0.1398 compared with the standard 
pseudo-fermion action. 

It is interesting to note that in contrast to our discussion in section the 
optimal value of p is the same for k = 0.1390 and k = 0.1398. One explanation 
might be that the low eigenmodes of the fermion matrix become relevant for the 
dynamics of the HMC algorithm with the leap-frog scheme only for k even closer 
to Kq. 

For the case of the Sexton-Weingarten improved scheme, we find, at k = 
0.1398, a gain of a factor of 2.4 in favour of the modified pseudo-fermion action. 
It is also interesting to note from table that in this case the optimal value of 
p decreases as n approaches Kc, following our expectations. The difference in 
the behaviour of the optimal values of p between the two integration schemes 
is not too surprising. For the standard pseudo-fermion action the step-size of 
higher order integration schemes has to be reduced more drastically than for the 
leap-frog scheme (see e.g. ref. |l36|). 

For p = 0 at K = 0.1398 we obtain an acceptance rate of 82% with the Sexton- 
Weingarten improved scheme and St = 0.0166... . This means that for the stan¬ 
dard pseudo-fermion action, the Sexton-Weingarten improved scheme performs 
worse than the leap-frog scheme. On the other hand, the gain due to the modified 
pseudo-fermion action is much larger for the Sexton-Weingarten improved scheme 
than for the leap-frog scheme which explains the gain in performance shown in 

fig- I- 

In table ^ we give results for k = 0.1395 for both pseudo-fermion actions of 
Sp and Sp. Here we have only used the Sexton-Weingarten improved scheme. 
The acceptance rate as well as the step-size for Sp are slightly larger than for 
Sp. Comparing the two modihed pseudo-fermion actions we hnd hence a minor 
advantage for Sp. Clearly, more studies have to be performed to be able to make 
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Figure 2: Performance results in terms of the total number of iterations in 

the BiCGstab solver as a function of p. We denote by (L) the leap frog and by 
(S) the improved integration scheme. 
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a solid statement, comparing the two modified actions. 

In table ^ we give results for the plaquette and for the maximal and minimal 
eigenvalues of Q^. The numbers that are quoted are naive averages over all runs 
listed in table ^ and the runs with the action Sp in table The error-bars are 
naively computed from the fluctuation of the averages of the single runs. Since 
the computation of the smallest and the largest eigenvalue of is rather CPU¬ 
consuming, we had measured it for k = 0.1390 and k = 0.1398 only once at 
the beginning of the run. For k = 0.1395 we measured the eigenvalues after 
each trajectory. Our results for the plaquette for n = 0.1390 and k = 0.1398 
are consistent within the quoted error-bars with those of 
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In the case of 
is, within error- 


K = 0.1395 there is a 2a discrepancy. The average value of \r 
bars, identical for all three values of k that we had studied. On the other hand, 
Xmin is decreasing as Hc is approached. Fitting the three values with the ansatz 


Xmin K) 

we obtain x = 1.85 ± 0.09, where we have fixed Kc = 0.1405. 


(57) 


5 Conclusion and outlook 


In ref. |]^ we suggested that the Hybrid-Monte-Carlo simulation of dynamical 


Wilson fermions can be substantially speeded up by a modification of the pseudo¬ 
fermion action. The basic idea is that the fermion matrix is split up into two 
factors such that each factor has a smaller condition number than the original 
fermion matrix. For each of the factors, a pseudo-fermion field is introduced. In 
ref. we found a speed-up of more than a factor of two for the largest value 
of K, that we simulated. 

In the present paper we have developed the idea of ref. 
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m a more gen¬ 
eral fashion. In addition to the original factorisation we discuss a second that is 


inspired by twisted mass QCD |^. Note that in our approach only one addi¬ 


tional free parameter p appears as compared with the standard HMC simulations. 
Our numerical results show that no fine tuning of the parameter p is needed. 
We argued that the condition number for the modified fermions is changed to 
the square root of the original condition numbers. This bears the potential of 
changing the scaling behaviour of the HMC algorithm when the chiral limit is 
approached. We have also demonstrated that the modified pseudo-fermion ac¬ 
tion can be easily used on top of even-odd preconditioning of the clover-improved 
Wilson fermion matrix. 

In our numerical study we find that for lattice-spacings, quark masses and 
lattice sizes that recently have been used in large scale simulations a reduction 
of the numerical cost of more than a factor of two compared with the standard 
pseudo-fermion action can be achieved. 
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Table 5: Results for the 16^x24 lattice at f3 = 5.2 and Cgw = 1-76 with the pseudo¬ 
fermion action Sp of eq. m- For each set of parameters we have generated 100 
trajectories, starting from an equilibrated configuration. 


K 

scheme 

p 

r :^ 

5 t 

P 

^ acc 

js^total 

0.139 

L 

0 . 

0.001 

0.01 

0 . 82 ( 2 ) 

8300 

0.139 

L 

0.05 

0.001 

0.02 

0 . 69 ( 3 ) 

6600 

0.139 

L 

0.1 

0.001 

0.02 

0 . 75 ( 3 ) 

6200 

0.139 

L 

0.25 

0.001 

0.02 

0 . 78 ( 3 ) 

5400 

0.139 

L 

0.5 

1 . 

0.02 

0 . 12 ( 3 ) 

3200 

0.139 

L 

0.5 

0.1 

0.02 

0 . 68 ( 4 ) 

3700 

0.139 

L 

0.5 

0.01 

0.02 

0 . 77 ( 2 ) 

4500 

0.139 

L 

0.5 

0.001 

0.02 

0 . 81 ( 2 ) 

5000 

0.139 

L 

0.8 

0.001 

0.02 

0 . 77 ( 2 ) 

4900 

0.139 

L 

1.2 

0.001 

0.02 

0 . 69 ( 3 ) 

4100 

0.139 

S 

0.1 

0.0001 

0 . 066 ... 

0 . 61 ( 3 ) 

4400 

0.139 

S 

0.2 

0.0001 

0 . 066 ... 

0 . 75 ( 3 ) 

3970 

0.139 

s 

0.35 

0.0001 

0 . 066 ... 

0 . 78 ( 2 ) 

3700 

0.139 

s 

0.5 

0.0001 

0 . 066 ... 

0 . 80 ( 3 ) 

3600 

0.139 

s 

0.65 

0.0001 

0 . 066 ... 

0 . 76 ( 2 ) 

3500 

0.139 

s 

0.8 

0.0001 

0 . 066 ... 

0 . 68 ( 3 ) 

3400 

0.139 

s 

1.0 

0.0001 

0 . 066 ... 

0 . 60 ( 3 ) 

3500 

0.1398 

L 

0 . 

0.0001 

0.01 

0 . 77 ( 3 ) 

17000 

0.1398 

L 

0.05 

0.0001 

0.02 

0 . 62 ( 3 ) 

11700 

0.1398 

L 

0.15 

0.0001 

0.02 

0 . 75 ( 3 ) 

10700 

0.1398 

L 

0.3 

0.0001 

0.02 

0 . 76 ( 2 ) 

10100 

0.1398 

L 

0.5 

0.1 

0.02 

0 . 53 ( 4 ) 

5800 

0.1398 

L 

0.5 

0.01 

0.02 

0 . 75 ( 3 ) 

7000 

0.1398 

L 

0.5 

0.0001 

0.02 

0 . 78 ( 2 ) 

9500 

0.1398 

L 

1.0 

0.0001 

0.02 

0 . 64 ( 4 ) 

9300 

0.1398 

S 

0 . 

0.0001 

0.02 

0 . 64 ( 4 ) 

16500 

0.1398 

s 

0 . 

0.0001 

0 . 0166 ... 

0 . 82 ( 2 ) 

21200 

0.1398 

s 

0.05 

0.0001 

0 . 066 ... 

0 . 35 ( 4 ) 

7500 

0.1398 

s 

0.1 

0.01 

0 . 066 ... 

0 . 46 ( 4 ) 

5100 

0.1398 

s 

0.1 

0.0001 

0 . 066 ... 

0 . 67 ( 3 ) 

6500 

0.1398 

s 

0.15 

0.0001 

0 . 066 ... 

0 . 72 ( 3 ) 

6800 

0.1398 

s 

0.2 

0.01 

0 . 066 ... 

0 . 60 ( 3 ) 

4700 

0.1398 

s 

0.2 

0.0001 

0 . 066 ... 

0 . 74 ( 3 ) 

6900 

0.1398 

s 

0.4 

0.0001 

0 . 066 ... 

0 . 49 ( 3 ) 

6100 

0.1398 

s 

0.5 

0.0001 

0.04545 ... 

0 . 82 ( 2 ) 

8700 
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Table 6: Results for the 16^ x 24 lattice at k = 0.1395. For the action of eq. ( ^ ) 
we have chosen = 0.001 as stopping criterion of the BiCGstab solver. In all 
cases, we have used the Sexton-Weingarten improved scheme. Each of the runs 
with the action ofeq. consists of 100 trajectories while the run with the action 


of eq. ( 1^) consists of 380 trajectories. We have used a step-size of 5 t = 0.057 
and 17 steps per trajectory, he. the trajectory length is not exactly equal to 
1. In the case of the action eq. &, we computed the acceptance rate Pace by 
averaging mm[l, exp(—Ai7)], while for the action ofeq. ( ^) we just have counted 
the accepted configurations. Ni and N 2 give the number of iterations needed by 
the solvers for the computation of the force due to Spi and Sf 2 , respectively. 
Note that the BiCGstab has to be applied twice, while the CG is only applied 
once to compute the force. 


action 

P 

St 

P 

^ acc 

iVi 

N2 


2C 

0.05 

0.066... 

0.40(3) 

30.2 

70 


20 

0.2 

0.066... 

0.78(3) 

15.4 

67 


20 

0.5 

0.066... 

0.66(3) 

9.2 

61 

21 

0.2236 

0.057 

0.75(3) 

31.25 

223 


Table 7: Results for the observables from the simulations of the 16^ x 24 lattice. 
For comparison we give the result for piaquette obtained in ref. m/ (Pukqcd)- 
A discussion of the error-bars is given in the text. 


K 

Pukqcd 

P 

^max 

^min 

0.1390 

0.1395 

0.1398 

0.51593(14) 

0.52196(9) 

0.52477(12) 

0.51602(12) 

0.52174(9) 

0.52466(11) 

2.95(2) 

2.960(4) 

2.95(2) 

0.00110(4) 

0.000513(7) 

0.00028(2) 
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One interesting observation in our study is that the partially improved scheme 
of Sexton and Weingarten does proht more from the modihcation of the pseudo¬ 
fermion action than the leap-frog scheme. In fact, this is not too surprising, 
since higher order schemes have more problems in the limit k ^ Kc than the 
leap-frog scheme. This is nicely studied in and also found in recent large 
scale simulations [jl^. Based on our experience it would be interesting to study 
genuinely higher order integration schemes combined with the modihed pseudo¬ 
fermion action. 

There are a number of directions in which the present work could be extended. 
Our general framework (|T^) allows for a large variety of factorisations of the 


fermion matrix beyond our two choices of eqs. (^OD and (|2l|) . For example, one 
could divide the lattice in sub-lattices and construct the matrix W by eliminating 
all hopping terms from Q that connect different sub-lattices. Such a construction 
might be particularly useful for a massively parallel computer. 


An obvious idea is to enlarge the number of factors in eq. (p!5[) . This might be 
particularly advantageous, when we go to light quark masses. In order to keep 
the condition number of the factors of the fermion matrix constant as the quark 
mass becomes lighter, more factors have to be introduced. 

A related modihcation of the HMC algorithm has been proposed and tested in 
refs. 
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The hrst difference compared with our approach is the factorisation 
of the fermion matrix: 

M = P{M)-^ , (58) 

where P{M) is a low order polynomial approximation of M~^. The order of the 
polynomial plays a similar role as the parameter p in our approach. The use of the 
polynomial might allow for a better separation of the UV-part of the spectrum 
than our factorisation. On the other hand, the polynomial P{M) requires the 
introduction of an auxiliary held for each order of the polynomial. 

In contrast to us, the authors of ref. ^ put the two contributions of the 
pseudo-fermion action on diherent time scales of the integration scheme. In our 
case it would also be possible to put 5^1 on the same time scale as Sq- In fact, 
preparing ref. we have tested this idea. However we found no advantage 


compared with our present setting. In order to keep the algorithm as simple as 
possible, we did not discuss this possibility here. 

An interesting idea is to apply the idea of refs. EH to the Polynomial- 
Hybrid-Monte-Carlo algorithm [|4, he. to use a polynomial approximation 
for both parts of the pseudo-fermion action: 


Sfi = \Pi(M)<l>P , Sf2 = \P2(M)h\ 


(59) 


where now Pi{M) is a rough, low order approximation of M~^ and the product 
Pi{M)P 2 {M) is an accurate approximation of M~^. This idea is particularly 
appealing since it could well be applied to the simulation of three flavour QCD. 
(See ref. and refs, therein). 
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